Abstract

The method of defensive forecasting is applied to the problem of pre- 
diction with expert advice for binary outcomes. It turns out that defensive 
forecasting is not only competitive with the Aggregating Algorithm but 
also handles the case of "second-guessing" experts, whose advice depends 
on the learner's prediction; this paper assumes that the dependence on 
the learner's prediction is continuous. 



1 Introduction 

There are many known techniques in competitive on-line prediction, such as fol- 
lowing the perturbed leader (see, e.g., [H Q31 [12] ) , Bayes-type aggregation (see, 
e.g., [Ill GIB IS]) and the closely related potential methods, gradient descent 
(see, e.g., [2]) and closely related exponentiated gradient descent [14], and the 
recently developed technique of defensive forecasting (see, e.g., [27, 24]). Defensive
forecasting combines the ideas of game-theoretic probability (see, e.g., [T5])
with Levin and Gacs's ideas of neutral measure [TBI EZ] and Foster and Vohra's 
ideas of universal calibration [5]. See [3] for a general review of competitive 
on-line prediction. 

This paper applies the technique of defensive forecasting to prediction with
expert advice in the simple case of binary outcomes. The learner's goal in
prediction with expert advice is to compete with free agents, called experts, who
are allowed to choose any predictions at each step. We will be interested in
performance guarantees of the type

$$L_N \le \min_{k=1,\ldots,K} L_N^k + a_K, \tag{1}$$

where $K$ is the number of experts, $a_K$ is a constant depending on $K$, $L_N$ is
the learner's cumulative loss over the first $N$ steps, and $L_N^k$ is the $k$th expert's
cumulative loss over the first $N$ steps (see §§3-5 for precise definitions).
It has been shown by Watkins ([22], Theorem 8) that the Aggregating Algorithm
(implementing Bayes-type aggregation for general loss functions [20, 21],
the AA for short) delivers the optimal value of the constant $a_K$ in (1) whenever
the goal (1) can be achieved. (Watkins's result was based on earlier results by
Haussler, Kivinen, and Warmuth [10], Theorem 3.1, and Vovk [21], Theorem
1, establishing the optimality of the AA for a large number of experts.) Theorem 3
of this paper asserts that, perhaps surprisingly, defensive forecasting also
achieves the same performance guarantee.

Whether the goal (1) is achievable depends on the loss function used for
evaluating the learner's and experts' performance. The necessary and sufficient
condition is that the loss function should be "perfectly mixable" (see §5 for a
definition). For simplicity, we first consider two specific, perhaps most important,
examples of perfectly mixable loss functions: the quadratic loss function in §3
and the log loss function in §4. Those two sections are self-contained in that
they do not require familiarity with the AA. In the last section, §5, we establish
the general result, for arbitrary perfectly mixable loss functions. In an appendix
we state Watkins's theorem in the form needed in this paper.

It is interesting that the technique of defensive forecasting is also applicable 
to experts who are allowed to "second-guess" the learner: their recommen- 
dations can depend (in a continuous manner in this paper) on the learner's 
prediction. It is not clear that second-guessing experts can be handled at all by 
the AA. 

A result similar to this paper's results is proved by Stoltz and Lugosi in [19], 
Theorem 14 (a more detailed comparison will be given in [25]). Second-guessing 
experts are useful in game theory (where competing with second-guessing ex- 
perts is known as prediction with a small internal regret). For a more down- 
to-earth example of a useful second-guessing expert, remember that humans 
tend to give too categorical (i.e., close to 0 or 1) predictions; therefore, a useful
second-guessing expert for a human learner would transform his/her predic- 
tions to less categorical ones (according to the learner's expected calibration 
curve [3]). 

2 Defensive forecasting 

Let $E$ be a topological space ($E = [0,1]^K$ in the application to prediction with
expert advice in §§3-5).

The binary forecasting protocol
$\mathcal{K}_0 := 1$.
FOR $n = 1, 2, \ldots$:
  Expert announces continuous $\gamma_n : [0,1] \to E$.
  Forecaster announces $p_n \in [0,1]$.
  Reality announces $\omega_n \in \{0,1\}$.
END FOR.
A process is any function $S : (E \times [0,1] \times \{0,1\})^* \to \mathbb{R}$. Given the sequence
of the players' moves in the binary forecasting protocol, we sometimes write
$S_N$, $N \in \{0, 1, \ldots\}$, for $S(\gamma_1(p_1), p_1, \omega_1, \ldots, \gamma_N(p_N), p_N, \omega_N)$. (Notice that $S_N$
depends on $\gamma_n$ only via $\gamma_n(p_n)$.) We also sometimes interpret $S_N$ as a function of
the players' moves in the protocol and identify the process $S$ with the sequence
of functions $S_N$, $N = 0, 1, \ldots$, on the set of all histories $(\gamma_1, p_1, \omega_1, \gamma_2, p_2, \omega_2, \ldots)$.
A process $S$ is said to be a supermartingale if it is always true that

$$p_N\, S(g_1, p_1, \omega_1, \ldots, g_{N-1}, p_{N-1}, \omega_{N-1}, g_N, p_N, 1)
  + (1 - p_N)\, S(g_1, p_1, \omega_1, \ldots, g_{N-1}, p_{N-1}, \omega_{N-1}, g_N, p_N, 0)
  \le S(g_1, p_1, \omega_1, \ldots, g_{N-1}, p_{N-1}, \omega_{N-1}) \tag{2}$$

(i.e., it is true for all $N$, all $g_1, \ldots, g_N$ in $E$, all $p_1, \ldots, p_N$ in $[0,1]$, and
all $\omega_1, \ldots, \omega_{N-1}$ in $\{0,1\}$). In the traditional theory of martingales (when
translated into our framework), Expert's move is an element of $E$ (in other
words, a constant function), and this would be sufficient for application to
the traditional problem of prediction with expert advice; however, the version
with second-guessing experts requires the generalization to $\gamma_n : [0,1] \to E$.
We say that a supermartingale $S$ is forecast-continuous if, for each $N$,
$S(g_1, p_1, \omega_1, \ldots, g_N, p_N, \omega_N)$ is a continuous function of $p_N \in [0,1]$ and $g_N \in E$.

Lemma 1 (Levin, Takemura) For any forecast-continuous supermartingale
$S$ there exists a strategy for Forecaster ensuring that $S_0 \ge S_1 \ge \cdots$ regardless
of the other players' moves.

Proof Set, for $p \in [0,1]$ and $\omega \in \{0,1\}$,

$$t(\omega, p) := S(\gamma_1(p_1), p_1, \omega_1, \ldots, \gamma_{N-1}(p_{N-1}), p_{N-1}, \omega_{N-1}, \gamma_N(p), p, \omega)
  - S(\gamma_1(p_1), p_1, \omega_1, \ldots, \gamma_{N-1}(p_{N-1}), p_{N-1}, \omega_{N-1}).$$

Our goal is to prove the existence of $p$ such that $t(\omega, p) \le 0$ for both $\omega = 0$
and $\omega = 1$. I will give an argument (from [24], the proof of Lemma 1) that is
applicable very generally.
For all $p, q \in [0,1]$ set

$$\phi(q, p) := q\, t(1, p) + (1 - q)\, t(0, p).$$

The function $\phi(q, p)$ is linear in its first argument, $q$, and continuous in its second
argument, $p$. Ky Fan's minimax theorem (see, e.g., pQ, Theorem 11.4) shows
that there exists $p^* \in [0,1]$ such that

$$\forall q \in [0,1]: \quad \phi(q, p^*) \le \sup_{p \in [0,1]} \phi(p, p).$$

The supermartingale property (2) says precisely that $\phi(p, p) \le 0$ for all $p$.
Therefore,

$$\forall q \in [0,1]: \quad q\, t(1, p^*) + (1 - q)\, t(0, p^*) \le 0,$$

and we can see (taking $q = 0$ and $q = 1$) that $t(\omega, p^*)$ never exceeds 0. ∎
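The existence argument above is not constructive, but in the binary case a suitable $p$ can be located numerically. The following is a minimal sketch and is not taken from the paper: it replaces Ky Fan's theorem by bisection on $t(1,p) - t(0,p)$, and the names `defensive_forecast` and `toy_increment` are hypothetical. It relies on the observation that (2) gives $t(1,1) \le 0$ and $t(0,0) \le 0$; if neither endpoint works, the difference $t(1,p) - t(0,p)$ changes sign on $[0,1]$, and at a zero of the difference both values coincide and are therefore non-positive.

```python
import math
from typing import Callable

def defensive_forecast(t: Callable[[int, float], float], iters: int = 60) -> float:
    """Return p in [0, 1] with t(0, p) <= 0 and t(1, p) <= 0, up to bisection error.

    Assumes t is continuous in p and p*t(1, p) + (1 - p)*t(0, p) <= 0 for all p,
    which is what the supermartingale property (2) provides.
    """
    # At p = 1 the assumption reads t(1, 1) <= 0, so p = 1 works if t(0, 1) <= 0 too.
    if t(0, 1.0) <= 0:
        return 1.0
    # Symmetrically, t(0, 0) <= 0, so p = 0 works if t(1, 0) <= 0 too.
    if t(1, 0.0) <= 0:
        return 0.0
    # Otherwise t(1, p) - t(0, p) is positive at p = 0 and negative at p = 1.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t(1, mid) - t(0, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def toy_increment(omega: int, p: float) -> float:
    """Hypothetical example: one constant expert predicting 0.7, quadratic loss, kappa = 2."""
    return math.exp(2 * ((p - omega) ** 2 - (0.7 - omega) ** 2)) - 1

p_star = defensive_forecast(toy_increment)
print(p_star, toy_increment(0, p_star), toy_increment(1, p_star))
```

In this toy case the only way to keep both increments non-positive is to agree with the expert, so the bisection settles at the expert's prediction.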
For generalizations (due to Levin and Takemura) of Lemma 1 in different
directions, see, e.g., [26] (Theorem 1) and [24] (Lemma 1). By defensive
forecasting we mean using such results in prediction with expert advice.

3 Algorithm competitive with continuous second-guessers: quadratic loss function

This is the version of the standard protocol of prediction with expert advice 
under quadratic loss for continuous second-guessing experts: 

Prediction with expert advice under quadratic loss
$L_0 := 0$.
$L_0^k := 0$, $k = 1, \ldots, K$.
FOR $n = 1, 2, \ldots$:
  Expert $k$ announces continuous $\gamma_n^k : [0,1] \to [0,1]$, $k = 1, \ldots, K$.
  Learner announces $p_n \in [0,1]$.
  Reality announces $\omega_n \in \{0,1\}$.
  $L_n := L_{n-1} + (p_n - \omega_n)^2$.
  $L_n^k := L_{n-1}^k + (\gamma_n^k(p_n) - \omega_n)^2$.
END FOR.

To apply Lemma 1 to the problem of prediction with expert advice under
quadratic loss, we will need the following result.

Lemma 2 Suppose $E = [0,1]$ and $\kappa \in [0, 2]$. The process

$$S_N := \exp\left( \kappa \sum_{n=1}^{N} \left( (p_n - \omega_n)^2 - (\gamma_n(p_n) - \omega_n)^2 \right) \right)$$

is a supermartingale in the binary forecasting protocol.
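Proof By (2), it suffices to check that

$$p \exp\left( \kappa \left( (p - 1)^2 - (g - 1)^2 \right) \right) + (1 - p) \exp\left( \kappa \left( (p - 0)^2 - (g - 0)^2 \right) \right) \le 1$$

for all $p, g \in [0,1]$. If we substitute $g = p + x$, the last inequality will reduce to

$$p\, e^{2\kappa(1-p)x} + (1 - p)\, e^{-2\kappa p x} \le e^{\kappa x^2}.$$

The last inequality is a simple corollary of Hoeffding's inequality ([11], (4.16),
which is true for any $h \in \mathbb{R}$; cf. [3], Lemma A.1). Indeed, applying Hoeffding's
inequality to the random variable

$$X := \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p, \end{cases}$$

we obtain

$$p\, e^{h(1-p)} + (1 - p)\, e^{-hp} \le e^{h^2/8},$$

which the substitution $h := 2\kappa x$ reduces to

$$p\, e^{2\kappa(1-p)x} + (1 - p)\, e^{-2\kappa p x} \le e^{\kappa^2 x^2/2} \le e^{\kappa x^2},$$

the last inequality assuming $\kappa \le 2$. ∎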
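The inequality checked in this proof is easy to probe numerically. The sketch below (an illustration only; the function names `lemma2_lhs` and `max_lhs` and the grid resolution are assumptions) evaluates its left-hand side on a uniform grid of $(p, g)$ and compares the admissible value $\kappa = 2$ with the hypothetical larger value $\kappa = 2.5$.

```python
import math

def lemma2_lhs(p: float, g: float, kappa: float) -> float:
    """Left-hand side of the inequality checked in the proof of Lemma 2."""
    return (p * math.exp(kappa * ((p - 1) ** 2 - (g - 1) ** 2))
            + (1 - p) * math.exp(kappa * (p ** 2 - g ** 2)))

def max_lhs(kappa: float, steps: int = 200) -> float:
    """Maximum of the left-hand side over a uniform grid of (p, g) in [0, 1]^2."""
    grid = [i / steps for i in range(steps + 1)]
    return max(lemma2_lhs(p, g, kappa) for p in grid for g in grid)

# kappa = 2: the maximum is attained on the diagonal g = p, where the value is 1.
print(max_lhs(2.0))
# kappa = 2.5: the value exceeds 1 near p = 1/2 for g close to p.
print(max_lhs(2.5))
```

A grid check of this kind is not a proof, but it shows where the constraint $\kappa \le 2$ bites: for $p = 1/2$ and small $x$ the left-hand side behaves like $1 + x^2(\kappa^2/2 - \kappa)$.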

Lemma 2 immediately implies a performance guarantee for the method of
defensive forecasting.

Theorem 1 There exists a strategy for Learner in the quadratic-loss protocol
with $K$ experts that guarantees

$$L_N \le L_N^k + \frac{\ln K}{2} \tag{3}$$

for all $N = 1, 2, \ldots$ and all $k \in \{1, \ldots, K\}$.

Proof Consider the binary forecasting protocol with $E = [0,1]^K$. By Lemma 2,
the process

$$\sum_{k=1}^{K} \exp\left( \kappa \sum_{n=1}^{N} \left( (p_n - \omega_n)^2 - (\gamma_n^k(p_n) - \omega_n)^2 \right) \right)$$

is a supermartingale. By Lemma 1, Learner has a strategy that prevents this
supermartingale from growing. This strategy ensures

$$\sum_{k=1}^{K} \exp\left( \kappa \sum_{n=1}^{N} \left( (p_n - \omega_n)^2 - (\gamma_n^k(p_n) - \omega_n)^2 \right) \right) \le K,$$

which implies, for all $k \in \{1, \ldots, K\}$,

$$\exp\left( \kappa \sum_{n=1}^{N} \left( (p_n - \omega_n)^2 - (\gamma_n^k(p_n) - \omega_n)^2 \right) \right) \le K,$$

i.e., (3) in the case $\kappa = 2$. ∎
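To make the strategy in this proof concrete, here is a minimal simulation sketch under stated assumptions: the forecast is found by bisection (one convenient way to realise Lemma 1 in the binary case, not necessarily the author's procedure), and the three experts, the outcome sequence, and the names `forecast` and `play` are all hypothetical. The third expert is a second-guesser that shrinks Learner's prediction towards 1/2.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

KAPPA = 2.0  # the value of kappa used in the proof of Theorem 1

def forecast(weights: Sequence[float],
             experts: Sequence[Callable[[float], float]],
             iters: int = 60) -> float:
    """One step of defensive forecasting under quadratic loss.

    weights[k] = exp(KAPPA * (L_{n-1} - L^k_{n-1})); experts[k] maps p to gamma^k_n(p).
    """
    def t(omega: int, p: float) -> float:
        # Increment of the supermartingale if Learner plays p and Reality plays omega.
        return sum(w * (math.exp(KAPPA * ((p - omega) ** 2 - (g(p) - omega) ** 2)) - 1)
                   for w, g in zip(weights, experts))
    if t(0, 1.0) <= 0:      # t(1, 1) <= 0 always holds, so p = 1 is safe
        return 1.0
    if t(1, 0.0) <= 0:      # t(0, 0) <= 0 always holds, so p = 0 is safe
        return 0.0
    lo, hi = 0.0, 1.0       # t(1, .) - t(0, .) is > 0 at lo and < 0 at hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t(1, mid) - t(0, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def play(experts: Sequence[Callable[[float], float]],
         outcomes: Sequence[int]) -> Tuple[float, List[float]]:
    """Run the quadratic-loss protocol; return Learner's and the experts' losses."""
    loss, expert_losses = 0.0, [0.0] * len(experts)
    for omega in outcomes:
        weights = [math.exp(KAPPA * (loss - lk)) for lk in expert_losses]
        p = forecast(weights, experts)
        expert_losses = [lk + (g(p) - omega) ** 2
                         for lk, g in zip(expert_losses, experts)]
        loss += (p - omega) ** 2
    return loss, expert_losses

experts = [lambda p: 0.2, lambda p: 0.8, lambda p: 0.25 + 0.5 * p]
rng = random.Random(0)
outcomes = [1 if rng.random() < 0.7 else 0 for _ in range(300)]
loss, expert_losses = play(experts, outcomes)
bound = math.log(len(experts)) / 2
print(loss, expert_losses, bound)
```

Up to the bisection tolerance, Learner's loss should not exceed any expert's loss by more than $(\ln K)/2$, which is what (3) asserts.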

For the proof of (3) being the performance guarantee for the AA, see, e.g.,
[20], Example 4, or [23], §2.4. It is interesting that even such an apparently
minor deviation from the AA as replacing the AA-type averaging of the
experts' predictions by the arithmetic mean (with the same exponential weighting
scheme) leads to a suboptimal result: the constant 2 in (3) is replaced by 1/2
([T5], reproduced in [23], Remark 3).
4 Algorithm competitive with continuous second-guessers: log loss function



The log loss function is defined by

$$\lambda(\omega, p) := \begin{cases} -\ln p & \text{if } \omega = 1 \\ -\ln(1 - p) & \text{if } \omega = 0, \end{cases}$$

where $\omega \in \{0,1\}$ and $p \in [0,1]$; notice that the loss function is now allowed to
take value $\infty$. The protocol of prediction with expert advice becomes:



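Prediction with expert advice under log loss
$L_0 := 0$.
$L_0^k := 0$, $k = 1, \ldots, K$.
FOR $n = 1, 2, \ldots$:
  Expert $k$ announces continuous $\gamma_n^k : [0,1] \to [0,1]$, $k = 1, \ldots, K$.
  Learner announces $p_n \in [0,1]$.
  Reality announces $\omega_n \in \{0,1\}$.
  $L_n := L_{n-1} + \lambda(\omega_n, p_n)$.
  $L_n^k := L_{n-1}^k + \lambda(\omega_n, \gamma_n^k(p_n))$.
END FOR.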



This is the analogue of Lemma 2 for the log loss function:

Lemma 3 Suppose $E = [0,1]$ and $\kappa \in [0,1]$. The process

$$S_N := \exp\left( \kappa \sum_{n=1}^{N} \left( \lambda(\omega_n, p_n) - \lambda(\omega_n, \gamma_n(p_n)) \right) \right)$$

is a supermartingale in the binary forecasting protocol.

Proof It suffices to check that

$$p \exp\left( \kappa \left( -\ln p + \ln g \right) \right) + (1 - p) \exp\left( \kappa \left( -\ln(1 - p) + \ln(1 - g) \right) \right) \le 1,$$

i.e., that

$$p^{1-\kappa} g^{\kappa} + (1 - p)^{1-\kappa} (1 - g)^{\kappa} \le 1,$$

for all $p, g \in [0,1]$. The last inequality immediately follows from the inequality
between the geometric and arithmetic means when $\kappa \in [0,1]$. (The left-hand
side of that inequality is a special case of what is known as the Hellinger integral
in probability theory.) ∎
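The appeal to the inequality between the geometric and arithmetic means can be spelled out as follows; this is a short worked step using the weighted form of that inequality with weights $1 - \kappa$ and $\kappa$, applied to each of the two terms separately.

```latex
\begin{align*}
p^{1-\kappa} g^{\kappa} &\le (1-\kappa)\,p + \kappa\,g, \\
(1-p)^{1-\kappa} (1-g)^{\kappa} &\le (1-\kappa)(1-p) + \kappa(1-g), \\
p^{1-\kappa} g^{\kappa} + (1-p)^{1-\kappa} (1-g)^{\kappa}
  &\le (1-\kappa)\bigl(p + (1-p)\bigr) + \kappa\bigl(g + (1-g)\bigr) = 1 .
\end{align*}
```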

Lemma 3 implies a performance guarantee for the log loss function as in the
previous section.

Theorem 2 There exists a strategy for Learner in the log loss protocol with $K$
experts that guarantees

$$L_N \le L_N^k + \ln K \tag{4}$$

for all $N = 1, 2, \ldots$ and all $k \in \{1, \ldots, K\}$.

Proof Take $\kappa := 1$. Lemma 3 guarantees that the process

$$\sum_{k=1}^{K} \exp\left( \kappa \sum_{n=1}^{N} \left( \lambda(\omega_n, p_n) - \lambda(\omega_n, \gamma_n^k(p_n)) \right) \right)$$

is a supermartingale in the binary forecasting protocol with $E = [0,1]^K$. Any
strategy for Learner that prevents this supermartingale from growing ensures
(4) for all $k \in \{1, \ldots, K\}$. ∎

For the proof of (4) being the performance guarantee for the AA, see, e.g.,
[20], Example 3.
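For experts who do not second-guess (constant $\gamma_n^k$) and $\kappa = 1$, the forecast that keeps the supermartingale from growing can be written in closed form: solving $t(1,p) \le 0$ and $t(0,p) \le 0$ for the process in the proof of Theorem 2 gives the average of the experts' advice with weights proportional to $e^{-L^k_{n-1}}$, i.e., the Bayesian mixture. The sketch below illustrates this special case only; the advice values, the outcome sequence, and the function names are assumptions made for the example.

```python
import math
import random
from typing import Sequence

def log_loss(omega: int, p: float) -> float:
    """Log loss of predicting p (strictly between 0 and 1 here) for outcome omega."""
    return -math.log(p if omega == 1 else 1.0 - p)

def mixture_forecast(expert_losses: Sequence[float], advice: Sequence[float]) -> float:
    """Average of the advice with weights proportional to exp(-L^k_{n-1})."""
    shift = min(expert_losses)  # subtracting a constant leaves the ratio unchanged
    weights = [math.exp(shift - lk) for lk in expert_losses]
    return sum(w * g for w, g in zip(weights, advice)) / sum(weights)

advice = [0.2, 0.6, 0.9]  # constant, non-second-guessing experts (an assumption)
rng = random.Random(1)
outcomes = [1 if rng.random() < 0.6 else 0 for _ in range(500)]

loss, expert_losses = 0.0, [0.0] * len(advice)
for omega in outcomes:
    p = mixture_forecast(expert_losses, advice)
    loss += log_loss(omega, p)
    expert_losses = [lk + log_loss(omega, g) for lk, g in zip(expert_losses, advice)]

bound = math.log(len(advice))
print(loss - min(expert_losses), bound)
```

With second-guessing experts no such closed form is available in general, and a root-finding step in $p$ is needed instead.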

5 Algorithm competitive with continuous second-guessers: perfectly mixable loss functions

In this section we assume that Learner chooses his predictions from a non-empty
decision space $\Gamma$ and that his performance is evaluated using a loss function
$\lambda : \{0,1\} \times \Gamma \to \mathbb{R}$. The triple $(\{0,1\}, \Gamma, \lambda)$ will sometimes be called our game
of prediction (the first element, the outcome space $\{0,1\}$, is redundant at this
time). The loss function will be assumed bounded below; there is no further
loss of generality in assuming that it is non-negative.

As mentioned in §1, to have a chance of achieving (1), the loss function has to
be assumed "perfectly mixable" (this will be further discussed in the appendix);
we start from defining this property.

A point $(x, y)$ of the plane $\mathbb{R}^2$ is called a superprediction (with respect to
the loss function $\lambda$) if there exists a decision $\gamma \in \Gamma$ such that

$$\lambda(0, \gamma) \le x \quad \& \quad \lambda(1, \gamma) \le y.$$

Our next assumption about the game of prediction will be that the
superprediction set is closed.

Let $\eta$ be a positive constant (the learning rate used). A shift of the curve
$\{(x, y) \mid e^{-\eta x} + e^{-\eta y} = 1\}$ in $\mathbb{R}^2$ is the curve $\{(x, y) \mid e^{-\eta(x+\alpha)} + e^{-\eta(y+\beta)} = 1\}$
for some $\alpha, \beta \in \mathbb{R}$ (i.e., it is a parallel translation of $e^{-\eta x} + e^{-\eta y} = 1$ in any
direction and by any distance). The loss function is called $\eta$-mixable if for each
point $(a, b)$ on the boundary of the superprediction set there exists a shift of
$e^{-\eta x} + e^{-\eta y} = 1$ passing through $(a, b)$ such that the superprediction set lies
completely to one side of the shift (it is clear that in this case the superprediction
set must lie to the Northeast of the shift). The loss function is perfectly mixable
if it is $\eta$-mixable for some $\eta > 0$.
Suppose $\lambda$ is $\eta$-mixable, $\eta > 0$. Each decision $\gamma \in \Gamma$ can be represented by the
point $(\lambda(0, \gamma), \lambda(1, \gamma))$ in the superprediction set. The set of all $(\lambda(0, \gamma), \lambda(1, \gamma))$,
$\gamma \in \Gamma$, will be called the prediction set; for typical games this set coincides with
the boundary of the superprediction set. As far as the attainable performance
guarantees are concerned (before we start paying attention to computational
issues), the only interesting part of the game of prediction is its prediction
set; the game itself can be regarded as an arbitrary coordinate system in the
prediction set. It will be convenient to introduce another coordinate system in
essentially the same set.

For each $p \in [0,1]$, let $(a_p, b_p)$ be the point $(x, y)$ in the superprediction set
at which the minimum of $py + (1 - p)x$ is attained. Since $\lambda$ is $\eta$-mixable, the
point $(a_p, b_p)$ is determined uniquely; it is clear that the dependence of $(a_p, b_p)$
on $p$ is continuous.
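As an illustration of the point $(a_p, b_p)$, the sketch below approximates it by a grid search for a game whose decision space is already $[0,1]$ (an assumption made only for this example; the names `standard_point` and `quadratic` are hypothetical). For the quadratic loss the minimum of $p\,\lambda(1,\gamma) + (1-p)\,\lambda(0,\gamma)$ is attained at $\gamma = p$, so $(a_p, b_p) = (p^2, (1-p)^2)$.

```python
from typing import Callable, Tuple

def standard_point(loss: Callable[[int, float], float], p: float,
                   steps: int = 10_000) -> Tuple[float, float]:
    """Approximate (a_p, b_p) for a game whose decisions are gamma in [0, 1].

    Minimises p*loss(1, gamma) + (1 - p)*loss(0, gamma) over a uniform grid of gamma
    and returns the pair (loss(0, gamma), loss(1, gamma)) at the minimiser.
    """
    best = min((i / steps for i in range(steps + 1)),
               key=lambda g: p * loss(1, g) + (1 - p) * loss(0, g))
    return loss(0, best), loss(1, best)

def quadratic(omega: int, gamma: float) -> float:
    """Quadratic loss of the decision gamma for the outcome omega."""
    return (gamma - omega) ** 2

a, b = standard_point(quadratic, 0.3)
print(a, b)  # close to (0.3**2, 0.7**2)
```

The search runs over the prediction set rather than the whole superprediction set; for $p$ strictly between 0 and 1 this makes no difference, since increasing either coordinate only increases $py + (1-p)x$.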

We can now redefine the decision space and the loss function as follows: the
decision space becomes $[0,1]$ and the loss function becomes

$$\lambda(0, p) := a_p, \qquad \lambda(1, p) := b_p.$$

The resulting game of prediction is essentially the same as the original game
(one of the minor differences is that, if the superprediction set has "corners", a
decision $\gamma \in \Gamma$ may be split into several decisions $p \in [0,1]$ in the new game, all
leading to the same losses). In the rest of this section, let us assume that the
game of prediction has been transformed to this standard form. Notice that the
new loss function is a "proper scoring rule" (see, e.g., [4]).

The protocol of this section formally coincides with that of the previous
section (although $\lambda$ ranges over a much wider class of loss functions):

Prediction with expert advice in a standard perfectly mixable game
$L_0 := 0$.
$L_0^k := 0$, $k = 1, \ldots, K$.
FOR $n = 1, 2, \ldots$:
  Expert $k$ announces continuous $\gamma_n^k : [0,1] \to [0,1]$, $k = 1, \ldots, K$.
  Learner announces $p_n \in [0,1]$.
  Reality announces $\omega_n \in \{0,1\}$.
  $L_n := L_{n-1} + \lambda(\omega_n, p_n)$.
  $L_n^k := L_{n-1}^k + \lambda(\omega_n, \gamma_n^k(p_n))$.
END FOR.

Lemmas 2 and 3 carry over to the perfectly mixable loss functions:

Lemma 4 Let $\eta > 0$, $(\{0,1\}, [0,1], \lambda)$ be a standard $\eta$-mixable game of
prediction, $E = [0,1]$, and $\kappa \in [0, \eta]$. The process

$$S_N := \exp\left( \kappa \sum_{n=1}^{N} \left( \lambda(\omega_n, p_n) - \lambda(\omega_n, \gamma_n(p_n)) \right) \right)$$

is a supermartingale in the binary forecasting protocol.
Proof It suffices to check that

$$p \exp\left( \kappa \left( \lambda(1, p) - \lambda(1, g) \right) \right) + (1 - p) \exp\left( \kappa \left( \lambda(0, p) - \lambda(0, g) \right) \right) \le 1$$

for all $p, g \in [0,1]$. As $\lambda$ is $\eta$-mixable, it will also be $\kappa$-mixable; we will only be
using the latter property. Using the notation $(a, b) := (a_p, b_p) = (\lambda(0, p), \lambda(1, p))$
and $(a', b') := (a_g, b_g) = (\lambda(0, g), \lambda(1, g))$, we can slightly simplify this
inequality:

$$p \exp\left( \kappa (b - b') \right) + (1 - p) \exp\left( \kappa (a - a') \right) \le 1. \tag{5}$$

It is clear that the superprediction set lies to the Northeast of the shift

$$e^{-\kappa(x+\alpha)} + e^{-\kappa(y+\beta)} = 1 \tag{6}$$

of $e^{-\kappa x} + e^{-\kappa y} = 1$ that passes through $(a, b)$
and has the tangent at $(a, b)$ orthogonal to $(1 - p, p)$,

$$\left. \left( -\kappa e^{-\kappa(x+\alpha)}, -\kappa e^{-\kappa(y+\beta)} \right) \right|_{x := a,\, y := b} \propto (1 - p, p) \tag{7}$$

(the expression on the left-hand side is the gradient of $e^{-\kappa(x+\alpha)} + e^{-\kappa(y+\beta)}$ at
$(a, b)$). We can see from (6) and (7) that

$$e^{-\kappa(a+\alpha)} = 1 - p, \qquad e^{-\kappa(b+\beta)} = p.$$

Substituting these values for $p$ and $1 - p$ in (5), we transform (5) to

$$e^{-\kappa(b'+\beta)} + e^{-\kappa(a'+\alpha)} \le 1,$$

which is true: the last inequality just says that $(a', b')$ is Northeast of the shift. ∎



Theorem 3 Let $\eta > 0$ and consider any standard $\eta$-mixable game of prediction
$(\{0,1\}, [0,1], \lambda)$. There exists a strategy for Learner in the prediction protocol
with $K$ experts that guarantees

$$L_N \le L_N^k + \frac{\ln K}{\eta} \tag{8}$$

for all $N = 1, 2, \ldots$ and all $k \in \{1, \ldots, K\}$.

Proof Take $\kappa := \eta$ and proceed as in the proof of Theorem 2 (using Lemma 4
instead of Lemma 3). ∎

Inequality (8) as the performance guarantee for the AA is derived in [20],
Theorem 1.
Appendix: Watkins's theorem



Watkins's theorem is stated in [22] (Theorem 8) not in sufficient generality: it 
presupposes that the loss function is perfectly mixable. The proof, however, 
shows that this assumption is irrelevant (it can be made part of the conclusion), 
and the goal of this appendix is to give a self-contained statement of a suitable 
version of the theorem. 

By a game of prediction we now mean a triple $(\Omega, \Gamma, \lambda)$, where $\Omega$ and $\Gamma$ are
sets called the outcome and decision space, respectively, and $\lambda : \Omega \times \Gamma \to \overline{\mathbb{R}}$
is called the loss function ($\overline{\mathbb{R}}$ is the extended real line $\mathbb{R} \cup \{-\infty, \infty\}$ with the
standard topology, although the value $-\infty$ will be later disallowed).

Partly following [21], for each $K = 1, 2, \ldots$ and each $a > 0$ we consider
the following perfect-information game $\mathcal{G}_K(a)$ (the "global game") between two
players, Learner and Environment.

Global game $\mathcal{G}_K(a)$
$L_0 := 0$.
$L_0^k := 0$, $k = 1, \ldots, K$.
FOR $n = 1, 2, \ldots$:
  Environment chooses $\gamma_n^k \in \Gamma$, $k = 1, \ldots, K$.
  Learner chooses $\gamma_n \in \Gamma$.
  Environment chooses $\omega_n \in \Omega$.
  $L_n := L_{n-1} + \lambda(\omega_n, \gamma_n)$.
  $L_n^k := L_{n-1}^k + \lambda(\omega_n, \gamma_n^k)$, $k = 1, \ldots, K$.
END FOR.

Learner wins if, for all $N = 1, 2, \ldots$ and all $k \in \{1, \ldots, K\}$,

$$L_N \le L_N^k + a; \tag{9}$$

otherwise, Environment wins.

It is possible that $L_N = \infty$ or $L_N^k = \infty$ in (9); the interpretation of inequalities
involving infinities is natural.

For each $K$ we will be interested in the set of those $a > 0$ for which Learner
has a winning strategy in the game $\mathcal{G}_K(a)$ (we will denote this by $L \smile \mathcal{G}_K(a)$).
It is obvious that

$$L \smile \mathcal{G}_K(a) \;\&\; a' > a \implies L \smile \mathcal{G}_K(a');$$

therefore, for each $K$ there exists a unique borderline value $a_K$ such that
$L \smile \mathcal{G}_K(a)$ holds when $a > a_K$ and fails when $a < a_K$. It is possible that $a_K = \infty$
(but remember that we are only interested in finite values of $a$).

These are our assumptions about the game of prediction (similar to those in
[21]):

• $\Gamma$ is a compact topological space;

• for each $\omega \in \Omega$, the function $\gamma \in \Gamma \mapsto \lambda(\omega, \gamma)$ is continuous;

• there exists $\gamma \in \Gamma$ such that, for all $\omega \in \Omega$, $\lambda(\omega, \gamma) < \infty$;

• the function $\lambda$ is bounded below.

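We say that the game of prediction $(\Omega, \Gamma, \lambda)$ is $\eta$-mixable, where $\eta > 0$, if

$$\forall \gamma_1 \in \Gamma,\ \gamma_2 \in \Gamma,\ \alpha \in [0,1]\ \ \exists \delta \in \Gamma\ \ \forall \omega \in \Omega:
\quad e^{-\eta\lambda(\omega, \delta)} \ge \alpha e^{-\eta\lambda(\omega, \gamma_1)} + (1 - \alpha) e^{-\eta\lambda(\omega, \gamma_2)}. \tag{10}$$

In the binary case, $\Omega = \{0,1\}$, this condition says that the image of the
superprediction set under the mapping $(x, y) \mapsto (e^{-\eta x}, e^{-\eta y})$ is convex, and it is easy
to see that it is equivalent to the definition used in §5.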
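Condition (10) can be tested numerically for a concrete game. The sketch below does this for the binary quadratic loss $\lambda(\omega, \gamma) = (\gamma - \omega)^2$ with $\Gamma = [0,1]$; it is an illustration under assumptions (the function names, the grid, and the tolerance are not from the paper). For this loss a suitable $\delta$ exists exactly when the interval of $\delta$ allowed by $\omega = 1$ meets the interval allowed by $\omega = 0$, which reduces (10) to a comparison of two square roots.

```python
import math

def condition_10_holds(eta: float, g1: float, g2: float, alpha: float,
                       tol: float = 1e-9) -> bool:
    """Check (10) for binary quadratic loss at one triple (gamma_1, gamma_2, alpha)."""
    # Right-hand sides of (10) for omega = 0 and omega = 1.
    r0 = alpha * math.exp(-eta * g1 ** 2) + (1 - alpha) * math.exp(-eta * g2 ** 2)
    r1 = (alpha * math.exp(-eta * (1 - g1) ** 2)
          + (1 - alpha) * math.exp(-eta * (1 - g2) ** 2))
    # delta must satisfy delta^2 <= -ln(r0)/eta and (1 - delta)^2 <= -ln(r1)/eta.
    upper = math.sqrt(max(0.0, -math.log(r0)) / eta)
    lower = 1.0 - math.sqrt(max(0.0, -math.log(r1)) / eta)
    return lower <= upper + tol

def is_mixable_on_grid(eta: float, steps: int = 20) -> bool:
    """Check (10) for all gamma_1, gamma_2 on a grid of [0, 1] and alpha in {0, 0.1, ..., 1}."""
    grid = [i / steps for i in range(steps + 1)]
    alphas = [j / 10 for j in range(11)]
    return all(condition_10_holds(eta, g1, g2, a)
               for g1 in grid for g2 in grid for a in alphas)

print(is_mixable_on_grid(2.0))  # consistent with kappa <= 2 in Lemma 2
print(is_mixable_on_grid(2.5))  # violated, e.g., near gamma_1, gamma_2 = 0.45, 0.55
```

A grid check cannot establish mixability, but a single violated triple, such as the one found for $\eta = 2.5$, does refute it.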

It follows from [9] (Theorem 92, applied to the means $\mathfrak{M}_\phi$ with $\phi(x) = e^{-\eta x}$)
that if the prediction game is $\eta$-mixable it will remain $\eta'$-mixable for any positive
$\eta' < \eta$. (For another proof, see the end of the proof of Lemma 9 in [21].) Let
$\eta^*$ be the supremum of the $\eta$ for which the prediction game is $\eta$-mixable (with
$\eta^* := 0$ when the game is not perfectly mixable). The compactness of $\Gamma$ implies
that the prediction game is $\eta^*$-mixable.

Theorem 4 (Chris Watkins) For any $K \in \{1, 2, \ldots\}$,

$$a_K = \frac{\ln K}{\eta^*}.$$

In particular, $a_K < \infty$ if and only if the game is perfectly mixable.

It is easy to see that $L \smile \mathcal{G}_K(a_K)$: this follows both from general considerations
(cf. Lemma 3 in [21]) and from the fact that the AA and this paper's algorithm
based on defensive forecasting (the latter assuming $\Omega = \{0,1\}$) win
$\mathcal{G}_K(a_K) = \mathcal{G}_K(\ln K / \eta^*)$.

Proof of Theorem 4 The proof will use Theorem 1 of [21]. Without loss
of generality we can, and will, assume $\lambda > 1$ (add a suitable constant to $\lambda$ if
needed); therefore, Assumption 4 of [21] (the only assumption in [21] not directly
made in this paper) is satisfied. In view of the fact that $L \smile \mathcal{G}_K(\ln K / \eta^*)$,
we only need to show that $L \smile \mathcal{G}_K(a)$ does not hold for $a < \ln K / \eta^*$. Fix
$a < \ln K / \eta^*$.

Since the two-fold convex mixture in (10) can be replaced by any finite convex
mixture (apply two-fold mixtures repeatedly), the point $(1, 1/\eta^*)$ belongs to the
separation curve (set $\beta := e^{-\eta}$ in the definition of $c(\beta)$) whereas the point
$(1, a / \ln K)$ is Southwest and outside of the separation curve (use Lemmas 8-12
of [21]). Therefore, E (=Environment) has a winning strategy in the game
$\mathcal{G}(1, a / \ln K)$, as defined in [21]. It is easy to see from the proof of Theorem 1 in
[21] that the definition of the game $\mathcal{G}$ in [21] can be modified, without changing
the conclusion about $\mathcal{G}(1, a / \ln K)$, by replacing the line

E chooses $n \ge 1$ {size of the pool}

in the protocol on p. 153 of [21] by

E chooses $n^* \ge 1$ {lower bound on the size of the pool}
L chooses $n \ge n^*$ {size of the pool}

(indeed, the proof in §6 of [21] only requires that there should be sufficiently
many experts). Let $n^*$ be the first move by Environment according to her
winning strategy.

Now suppose $L \smile \mathcal{G}_K(a)$. From the fact that there exists Learner's strategy
$\mathcal{L}_1$ winning $\mathcal{G}_K(a)$ we can deduce: there exists Learner's strategy $\mathcal{L}_2$ winning
$\mathcal{G}_{K^2}(2a)$ (we can split the $K^2$ experts into $K$ groups of $K$, merge the experts'
decisions in every group with $\mathcal{L}_1$, and finally merge the groups' decisions with
$\mathcal{L}_1$); there exists Learner's strategy $\mathcal{L}_3$ winning $\mathcal{G}_{K^3}(3a)$ (we can split the $K^3$
experts into $K$ groups of $K^2$, merge the experts' decisions in every group with
$\mathcal{L}_2$, and finally merge the groups' decisions with $\mathcal{L}_1$); and so on. When the
number $K^m$ of experts exceeds $n^*$, we obtain a contradiction: Learner can
guarantee

$$L_N \le L_N^k + ma$$

for all $N$ and all $K^m$ experts $k$, and Environment can guarantee that

$$L_N > L_N^k + \frac{a}{\ln K} \ln(K^m) = L_N^k + ma$$

for some $N$ and $k$. ∎